Phonetic normalization using z-score in segmental prosody estimation for corpus-based TTS system

Authors

  • Hoeun Song
  • Jaein Kim
  • Kyongrok Lee
  • Jinyoung Kim
Abstract

Recently, corpus-based text-to-speech (CB-TTS) has been actively studied throughout the world. Statistical training methods are generally used to learn prosodic rules in CB-TTS, and the classification and regression tree (CART) is one of the most widely used. In this paper, we present an efficient CART training approach based on z-score phonetic normalization. The idea comes from the fact that the three most important parameters in CART training for segmental prosody are the phone itself and its left and right neighboring phones, especially in Korean. Our approach effectively reduces the number of CART terminal nodes: the reduction ratios are approximately 14-94% for segmental duration estimation and 45-70% for intensity estimation. The experimental results also show that phonetic normalization slightly reduces the estimation errors.
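
The abstract does not spell out the normalization procedure, so the following is only a minimal sketch of what per-phone z-score normalization could look like, assuming each phone's duration (or intensity) values are standardized with that phone's own mean and standard deviation before CART training and mapped back afterwards. The function names, the toy phone labels, and the millisecond values are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_phone_stats(samples):
    """Compute {phone: (mean, std)} from (phone, value) pairs."""
    by_phone = defaultdict(list)
    for phone, value in samples:
        by_phone[phone].append(value)
    stats = {}
    for phone, values in by_phone.items():
        mu = mean(values)
        sigma = pstdev(values)
        # Guard against zero spread (one sample or constant values).
        stats[phone] = (mu, sigma if sigma > 0 else 1.0)
    return stats


def to_zscore(phone, value, stats):
    """Express a raw value relative to its phone's own distribution."""
    mu, sigma = stats[phone]
    return (value - mu) / sigma


def from_zscore(phone, z, stats):
    """Map a predicted z-score back to the raw scale for that phone."""
    mu, sigma = stats[phone]
    return mu + z * sigma


# Invented toy durations in milliseconds (not data from the paper).
samples = [
    ("a", 80.0), ("a", 100.0), ("a", 120.0),
    ("s", 140.0), ("s", 160.0), ("s", 180.0),
]
stats = fit_phone_stats(samples)

# A long "a" and a long "s" differ in raw duration but land on the
# same normalized target, so a tree trained on z-scores would not
# have to split on phone identity to separate them.
z_a = to_zscore("a", 120.0, stats)
z_s = to_zscore("s", 180.0, stats)
restored = from_zscore("a", z_a, stats)
```

Under this assumption, the regression tree would be trained on the z-scores rather than on raw values, and each prediction would be converted back with `from_zscore` using the statistics of the target phone.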

Similar resources

Implementing an SSML compliant concatenative TTS system

The W3C Speech Synthesis Markup Language (SSML) unifies a number of recent related markup languages that have emerged to fill the perceived need for increased, and standardized, user control over Text to Speech (TTS) engines. One of the main drivers for markup has been the increasing use of TTS engines as embedded components of specific applications – which means they are in a position to take ...

Unsupervised prosody labeling for constructing Mandarin TTS

This paper introduces an unsupervised prosody labeling method for preparing a large speech corpus used in developing a Mandarin Text-to-Speech system. Adopting a four-layer prosody hierarchy, the proposed method performs an unsupervised segmental clustering that iteratively segments spoken utterances into strings of prosodic constituents and models the patterns of the segmented prosodic constit...

Segment selection in the L&H RealSpeak laboratory TTS system

The L&H RealSpeak Laboratory TTS (RSLab) system is a corpus based speech synthesis system comprising components that deal with linguistic processing, prosody prediction, segment selection, concatenation and modification. In this paper we focus on the segment selection process. During segment selection, the units in a large database of speech are scored with a cost according to their prosodic/ph...

A Quantitative Study on Information Contribution of Prosody Phrase Boundaries in Chinese Speech

In speech, acoustic cues are used to manifest a number of linguistic events including segmental phonemes and supra-segmental ones such as tones, prosodic phrasing structure, intonation, etc. It has been an interesting topic to quantitatively compare the importance of different linguistic events. However, previous studies have been mainly confined to segmental or segment-like units. No studies c...

A Corpus-Based Concatenative Speech Synthesis System for Turkish

Speech synthesis is the process of converting written text into machine-generated synthetic speech. Concatenative speech synthesis systems form utterances by concatenating pre-recorded speech units. Corpus-based methods use a large inventory to select the units to be concatenated. In this paper, we design and develop an intelligible and natural sounding corpus-based concatenative speech synthes...

Publication date: 2002